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Abstract. The Kaczmarz method is an iterative algorithm for solving systems 
of linear equations Ax = b. Theoretical convergence rates for this algorithm were 
largely unknown until recently when work was done on a randomized version of 
the algorithm. It was proved that for overdetermined systems, the randomized 
Kaczmarz method converges with expected exponential rate, independent of the 
number of equations in the system. Here we analyze the case where the system 
Ax = b is corrupted by noise, so we consider the system Ax w 6 + r where r is 
an arbitrary error vector. We prove that in this noisy version, the randomized 
method reaches an error threshold dependent on the matrix A with the same rate 
as in the error-free case. We provide examples showing our results are sharp in 
the general context. 



The Kaczmarz method [8j is one of the most popular solvers of overdetermined 
linear systems and has numerous applications from computer tomography to image 
processing. It is an iterative method, and so therefore is practical in the realm of 
very large systems of equations. The algorithm consists of a series of alternating 
projections, and is often considered a type of Projection on Convex Sets (POCS) 
method. Given a consistent system of linear equations of the form 



the Kaczmarz method iteratively projects onto the solution spaces of each equation 
in the system. That is, if ai,...,am G IR" denote the rows of A, the method 
cyclically projects the current estimate orthogonally onto the hyperplanes consisting 
of solutions to (oj, x) = h^. Each iteration consists of a single orthogonal projection. 
The algorithm can thus be described using the recursive relation. 



where Xk is the k^^ iterate and i = {k mod m) + 1. 

Although the Kaczmarz method is popular in practice, theoretical results on the 
convergence rate of the method have been difficult to obtain. Most known estimates 
depend on properties of the matrix A which may be time consuming to compute, and 
are not easily comparable to those of other iterative methods (see e.g. [3], [3]). 
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Since the Kaczmarz method cycles through the rows of A sequentially, its conver- 
gence rate depends on the order of the rows. Intuition tells us that the order of the 
rows of A does not change the difficulty level of the system as a whole, so one would 
hope for results independent of the ordering. One natural way to overcome this is to 
use the rows of A in a random order, rather than sequentially. Several observations 
were made on the improvements of this randomized version [Qj [6] , but only recently 
have theoretical results been obtained [HI [T3] . 

1.1. Randomized Kaczmarz. In designing a random version of the Kaczmarz 
method, it is necessary to set the probability of each row being selected. Strohmer 
and Vershynin propose in [TTl [13] to set the probability proportional to the Euclidean 
norm of the row. Their revised algorithm can then be described by the following: 

_ bp(^i) — {ap(^i),Xk) 

\\^p{i) II 2 

II l|2 

where p{i) takes values in {1, . . . ,m} with probabilities ^ . Here and through- 
out, II A II F denotes the Frobenius norm of A and || ■ II2 denotes the usual Euchdean 
norm or spectral norm for vectors or matrices, respectively. We note here that of 
course, one needs some knowledge of the norm of the rows of A in this version of the 
algorithm. In general, this computation takes 0{mn) time. However, in many cases 
such as the case in which A contains Gaussian entries, this may be approximately 
or exactly known. 

In [HKTSj, Strohmer and Vershynin prove the following exponential bound on the 
expected rate of convergence for the randomized Kaczmarz method, 

(1.1) E||a;fc - a;||2 < (^1 - — j ||a;o-a;||2, 

where R = ||yl~-'^|p|| A|||,, xq is an arbitrary initial estimate, and E denotes the 
expectation (over the choice of the rows). Here and throughout, we will assume 
that A has full column rank so that ||A~^|| = inf{M : M||y4x||2 > ||3;||2 for all x} is 
well defined. We comment here that this particular mixed condition number comes 
as an immediate consequence of the simple probabilities used within the randomized 
algorithm. 

The first remarkable note about this result is that it is essentially independent 
of the number m of equations in the system. Indeed, by the definition of R, R is 
proportional to n within a square factor of n{A), the condition number of A {k,{A) 
is defined as the ratio of the largest to smallest singular values of A). This bound 
also demonstrates, however, that the Kaczmarz method is an efficient alternative to 
other methods only when the condition number is very small. If this is not the case, 
then other alternative methods may offer improvements in practice. 

The bound fll.ip and the relationship of i? to n shows that the estimate Xk con- 
verges exponentially fast to the solution in just 0{n) iterations. Since each iteration 
requires 0(n) time, the method overall has a O(n^) runtime. Being an iterative algo- 
rithm, it is clear that the randomized Kaczmarz method is competitive only for very 
large systems. For such large systems, the runtime of O(n^) is clearly superior to. 
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for example, Gaussian elimination which has a runtime of O(mn^). Also, since the 
algorithm needs only access to the randomly chosen rows of A, the method need not 
know the entire matrix A, which for very large systems is a clear advantage. Thus the 
interesting cases for the randomized method are those in which n and m are large, 
and especially those in which m is extremely large. Strohmer and Vershynin discuss 
in detail in Section 4.2 of [13] cases where the randomized Kaczmarz method even 
outperforms the conjugate gradient method (CGLS). They show that for example, 
randomized Kaczmarz computationally outperforms CGLS for Gaussian matrices 
when m > 3n. Numerical experiments in \L3l also demonstrate advantages of the 
randomized Kaczmarz method in many cases. 

Since the results of [HI [T3], there has been some further discussion about the 
benefits of this randomized version of the Kaczmarz method (see [2[ US])- The 
Kaczmarz method has been studied for over seventy years, and is useful in many 
applications. The notion of selecting the rows randomly in the method has been 
proposed before (see [3 |T[ |6]), and improvements over the standard method were 
observed. However, the work by Strohmer and Vershynin in [111 IS] provides the 
first proof on the rate of convergence. The rate is exponential in expectation and is 
in terms of standard matrix properties. We are not aware of any other Kaczmarz 
method that provably achieves exponential convergence. 

It is important to note that the method of row selection proposed in this version of 
the randomized Kaczmarz method is not optimal, and an example that demonstrates 
this is given in [13]. However, under this selection strategy, the convergence rates 
proven in [HI [13] are optimal, and there are matrices that satisfy the proven bounds 
exactly. The selection strategy in this method was chosen because it often yields 
very good results, allows a provable guarantee of exponential convergence, and is 
computationally efficient. 

Since the algorithm selects rows based on their row norms, it is natural to ask 
whether one can simply scale the rows any way one wishes. Indeed, choosing the rows 
based on their norms is related to the notion of applying a diagonal preconditioner. 
However, since finding the optimal diagonal preconditioner for a system Ax = b is 
itself a task that is often more costly than inverting the entire matrix, we select 
an easier, although not optimal, preconditioner that simply scales by the (square of 
the) row norms. This type of preconditioner yields a balance of computational cost 
and optimality (see [HUn]). The distinction between the effect of an alternative 
diagonal preconditioner on the Kaczmarz method versus the randomized method 
discussed here is important. If the system is multiplied by a diagonal matrix, the 
standard Kaczmarz method will not change, since the angles between all rows do 
not change. However, such a multiplication to the system in our randomized setting 
changes the probabilities of selecting the rows (by definition). It is then not a 
surprise that this will also affect the convergence rate proved for this method (since 
multiplication will affect the value of R in (11. ip ). 

This randomized version of the Kaczmarz method provides clear advantages over 
the standard method in many cases. Using the selection strategy above, Strohmer 
and Vershynin were able to provide a proof for the expected rate of convergence 
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that shows exponential convergence. No such convergence rate for any Kaczmarz 
method has been proven before. These benefits lead one to question whether the 
method works in the more realistic case where the system is corrupted by noise. 
In this paper we provide theoretical and empirical results to suggest that in this 
noisy case the method converges exponentially to the solution within a specified 
error bound. The error bound is proportional to a/R, and we also provide a simple 
example showing this bound is sharp in the general setting. 



2. Main Results 

Theoretical and empirical studies have shown the randomized Kaczmarz algorithm 
to provide very promising results. Here we show that it also performs well in the case 
where the system is corrupted with noise. In this section we consider the consistent 
system Ax = b after an error vector r is added to the right side: 

Ax ^ b + r. 

Note that we do not require the perturbed system to be consistent. First we present 
a simple example to gain intuition about how drastically the noise can affect the 
system. To that end, let A be the n x n identity matrix, 6 = 0, and suppose the 
error is the vector whose entries are all one, r = (1,1,. ..,1). Then the solution to 
the noisy system is clearly a; = r = (l,l,...,l), and the solution to the unperturbed 
problem is a; = 0. By Jensen's inequality, we have 

2 



^E||xfc-r||2j <E^||xfc-r| 



Now considering the noisy problem, we may substitute r for x in (11. ip . Combining 
this with Jensen's inequality above, we obtain 

(2.1) E||xfc-r||2< (l--j ||xo-r||2. 

Then by the triangle inequality, we have 

\\r - x\\2 < \\r - Xk\\2 + \\xk - x\\2. 
Next, by taking expectation and using (12. ip above, we have 

, / ix'^/^,, 

- x\\2 > \\r - x\\2 ~ \}~ ^) Ifo - r\\2- 
Finally by the definition of r and R, this implies 

E\\xk-x\\2 > VR - (l - 

This means that the limiting error between the iterates Xk and the original solution 
X is \/R. In [m [13] it is shown that the bound provided in (II. ip is optimal, so even 
this trivial example demonstrates that if we wish to maintain a general setting, the 
best error bound for the noisy case we can hope for is proportional to \/R. Our 
main result proves this exact theoretical bound. 



l\fc/2 

\\xo-r\\2. 
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Theorem 2.1 (Noisy randomized Kaczmarz). Let A have full column rank and 
assume the system Ax = b is consistent. Let xl be the k^^ iterate of the noisy 
randomized Kaczmarz method run with Ax ^ b + r, and let ai, . . . denote the 
rows of A. Then we have 

nK-xh<{l--) ||a;o||2 + v^7, 

where R = 7 = maxj j^^, cind the expectation is taken over the choice 

of the rows in the algorithm. 

Remark. In the case discussed above, note that we have 7 = 1, so the example 
indeed shows the bound is sharp. 

One may also recall the bound from perturbation theory (see e.g. [7]) on the 
relative error in the perturbed case. If we let x = A^{x + r) (where A^ = {A*A)-^A* 
denotes the left inverse of A), then 

\\x — xllo , llrlU 



By applying the bound y/R < K{A)y/n to Theorem 12.11 above, we obtain the bound 

Ik -^Ih , . ,^ Vn\ri\ 
< nyAj max 



These bounds look similar in spirit, providing some more reassurance to the sharp- 
ness of the error bound. It is important to note though that the first is obtained by 
applying the left inverse rather than an iterative method, which explains why the 
bounds are not exactly equal. Of course for problems of large sizes, applying the 
inverse may not even be computationally feasible. 

Before proving the theorem, it is important to first analyze what happens to the 
solution spaces of the original equations Ax = b when the error vector is added. 
Letting oi, . . . am denote the rows of A, we have that each solution space (oj, x) = bi 
of the original system is a hyperplane whose normal is -ir^V. When noise is added, 

11*^* II 2 

each hyperplane is translated in the direction of Oj. Thus the new geometry consists 
of hyperplanes parallel to those in the noiseless case. A simple computation provides 
the following lemma which specifies exactly how far each hyperplane is shifted. 

Lemma 2.2. Let Hi be the affine subspaces o/M" consisting of the solutions to the 
unperturbed equations, Hi = {x : {ai,x) = bi}. Let H* be the solution spaces of the 
noisy equations, H* = {x : (a^, x) = bi + rj}. Then H* = {w + aitti : w G Hi} where 



Remark. Note that this lemma does not imply that the noisy system is consistent. 
By definition of H* it is clear that each subspace is non-empty, but we are not 
requiring that the intersection of all H* be non-empty. 

Proof. First, if w E Hi then (oj, w + aai) = {ai, w)+a\\ai\\l = bi+ri, so w+aai G H*. 
Next let u G H*. Set w = u — aai. Then (oj, w) = {ai, u) — ri = bi + ri — ri = bi, so 
w G H*. This completes the proof. □ 
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We will also utilize the following lemma which is proved in the proof of Theorem 
2 in dUE]. 

Lemma 2.3. Let be any vector in M" and let Xk be its orthogonal projection 
onto a random solution space as in the noiseless randomized Kaczmarz method run 
with Ax = b. Then we have 

Ellxfc - x\\l < (l - 114-1 - ^Wl, 

where R = ||y4~^|p||y4|||., and the expectation is taken over the choice of the rows in 
the algorithm. 

We are now prepared to prove Theorem 12.11 

of Theorem \2.1[ Let denote the {k — iy^ iterate of noisy randomized Kaczmarz. 
Using notation as in Lemma 12. 2[ let H* be the solution space chosen in the k^^ 
iteration. Then xl is the orthogonal projection of xl_i onto H*. Let Xk denote the 
orthogonal projection of xl_-j^ onto Hi (see Figure [I]). 




Figure 1. The parallel hyperplanes Hi and H* along with the two 
projected vectors Xk and xl- 

By Lemma [22] and the fact that Oj is orthogonal to Hi and H*, we have that x^ — 
X = Xk — x + aiGi. Again by orthogonality, we have \\xl —x\\2 = \\xk — + ||ajaj||2. 
Then by Lemma [2.31 and the definition of 7, we have 



where the expectation is conditioned upon the choice of the random selections in 
the first k — 1 iterations. Then applying this recursive relation iteratively and taking 
full expectation, we have 



nK - < (1 - ^Yw^o - x\\i + 



j=0 

^ " '|2 I D,,2 



R 



1' 



< \\xQ~x\\i + RY- 



RANDOMIZED KACZMARZ SOLVER FOR NOISY LINEAR SYSTEMS 



7 



By Jensen's inequality we then have 

E\\xl-x\\2< {[1--J \\xo-x\\l + Rj'^j <(^l--j ll^o -a;||2 + Vi?7- 
This completes the proof. □ 

3. Numerical Examples 

In this section we describe some of our numerical results for the randomized 
Kaczmarz method in the case of noisy systems. Figure [2] depicts the error between 
the estimate by randomized Kaczmarz and the actual signal, in comparison with 
the predicted threshold value for several types of matrices. The first study was 
conducted for 100 trials using 2000 x 100 Gaussian matrices (matrices who entries 
are i.i.d. Gaussian with mean and variance 1) and independent Gaussian noise of 
norm 0.02. The systems were homogeneous, meaning a; = and 6 = 0. The thick line 
is a plot of the threshold value, 7a/R for each trial. The thin line is a plot of the error 
in the estimate after the given amount of iterations for the corresponding trial. The 
scatter plot displays the convergence of the method over several randomly chosen 
trials from this study, and clearly shows exponential convergence. The second study 
is a similar study but for the experiments in which we used partial Fourier matrices. 
In this case we use m = 700 and n = 101. For j = 1 . . . 700 and k = —50 . . . 50, we set 
y4j_fc = exp(27rz/ctj), where tj are generated uniformly at random on [0, 1]. This type 
of generation is used to create nonuniformly spaced sampling values, and is used in 
many applications in signal processing, such as in the reconstruction of bandlimited 
signals. The third study is similar but used matrices whose entries are Bernoulli (0/1 
each with probability 0.5). All of these experiments were conducted to demonstrate 
that the error found in practice is close to that predicted by the theoretical results. 
As is evident by the plots, the error is quite close to the threshold in all cases. 

Acknowledgment. I would like to thank Roman Vershynin for suggestions that 
simplified the proofs and for many thoughtful discussions. I would also like to thank 
Thomas Strohmer for his very appreciated guidance. 
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Figure 2. The comparison between the actual error in the random- 
ized Kaczmarz estimate (thin hne) and the predicted threshold (thick 
line). The mean values of R in these experiments were 163.2 (upper 
left), 428.6 (lower left) and 162.4 (lower right). The scatter plot shows 
exponential convergence over several trials. 



References 

C. Ccnkcr, H. G. Feichtinger, M. Mayer, H. Steier, and T. Strohmer. New variants of the poos 
method using affine subspaces of finite codimension, with apphcations to irregular samphng. 
Proc. SPIE: Visual Communications and Image Processing, pages 299-310, 1992. 
Y. Censor, G.T. Herman, and M. Jiang. A note on the behavior of the randomized Kaczmarz 
algorithm of Strohmer and Vershynin. J. Fourier Anal. Appl., 15:431-436, 2009. 

F. Deutsch and H. Hundal. The rate of convergence for the method of alternating projections. 
J. Math. Anal. AppL, 205(2):381-405, 1997. 

A. Galantai. On the rate of convergence of the alternating projection method in finite dimen- 
sional spaces. J. Math. Anal. AppL, 310(l):30-44, 2005. 

M. Hanke and W. Niethammer. On the acceleration of Kaczmarz's method for inconsistent 
linear systems. Linear Alg. AppL, 130:83-98, 1990. 

G. T. Herman and L.B. Meyer. Algebraic reconstruction techniques can be made computation- 
ally efficient. IEEE Transactions on Medical Imaging, 12(3):600-609, 1993. 

R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge Univ. Press, 1985. 

S. Kaczmarz. Angenaherte auflosung von systemen linearer gleichungen. BulL Internat. Acad. 

Polon.Sci. Lettres A, pages 335-357, 1937. 

F. Natterer. The Mathematics of Computerized Tomography. Wiley, New York, 1986. 
A. Shapiro. Upper bounds for nearly optimal diagonal scaling of matrices. Linear and Multi- 
linear Algebra, 29:145-147, 1991. 



RANDOMIZED KACZMARZ SOLVER FOR NOISY LINEAR SYSTEMS 



9 



[11] T. Strohmcr and R. Vershynin. A randomized solver for linear systems with exponential con- 
vergence. In RANDOM 2006 (10th International Workshop on Randomization and Computa- 
tion), number 4110 in Lecture Notes in Computer Science, pages 499-507. Springer, 2006. 

[12] T. Strohmer and R. Vershynin. Comments on the randomized Kaczmarz method. J. Fourier 
Anal. Appl., 15:437-440, 2009. 

[13] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential conver- 
gence. J. Fourier Anal. Appl., 15:262-278, 2009. 

[14] A. van der Sluis. Condition numbers and equilibration of matrices. Numer. Math., 14:14-23, 
1969. 



